NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Program Analysis for Adaptive Data Analysis

https://doi.org/10.1145/3656414

Liu, Jiawen; Qu, Weihao; Gaboardi, Marco; Garg, Deepak; Ullman, Jonathan (June 2024, Proceedings of the ACM on Programming Languages)

Data analyses are usually designed to identify some property of the population from which the data are drawn, generalizing beyond the specific data sample. For this reason, data analyses are often designed in a way that guarantees that they produce a low generalization error. That is, they are designed so that the result of a data analysis run on a sample data does not differ too much from the result one would achieve by running the analysis over the entire population. An adaptive data analysis can be seen as a process composed by multiple queries interrogating some data, where the choice of which query to run next may rely on the results of previous queries. The generalization error of each individual query/analysis can be controlled by using an array of well-established statistical techniques. However, when queries are arbitrarily composed, the different errors can propagate through the chain of different queries and bring to a high generalization error. To address this issue, data analysts are designing several techniques that not only guarantee bounds on the generalization errors of single queries, but that also guarantee bounds on the generalization error of the composed analyses. The choice of which of these techniques to use, often depends on the chain of queries that an adaptive data analysis can generate. In this work, we consider adaptive data analyses implemented as while-like programs and we design a program analysis which can help with identifying which technique to use to control their generalization errors. More specifically, we formalize the intuitive notion ofadaptivityas a quantitative property of programs. We do this because the adaptivity level of a data analysis is a key measure to choose the right technique. Based on this definition, we design a program analysis for soundly approximating this quantity. The program analysis generates a representation of the data analysis as a weighted dependency graph, where the weight is an upper bound on the number of times each variable can be reached, and uses a path search strategy to guarantee an upper bound on the adaptivity. We implement our program analysis and show that it can help to analyze the adaptivity of several concrete data analyses with different adaptivity structures.
more » « less
Full Text Available
Athena: high-performance sparse tensor contraction sequence on heterogeneous memory

https://doi.org/10.1145/3447818.3460355

Liu, Jiawen; Li, Dong; Gioiosa, Roberto; Li, Jiajia (June 2021, International Conference on Supercomputing (ICS))
null (Ed.)
Full Text Available
Sparta: high-performance, element-wise sparse tensor contraction on heterogeneous memory

https://doi.org/10.1145/3437801.3441581

Liu, Jiawen; Ren, Jie; Gioiosa, Roberto; Li, Dong; Li, Jiajia (February 2021, Principles and Practice of Parallel Programming)
null (Ed.)
Full Text Available
Runtime Concurrency Control and Operation Scheduling for High Performance Neural Network Training

Liu, Jiawen; Li, Dong; Kestor, Gokcen; Vetter, Jeffrey (May 2019, Proceeding of International Parallel and Distributed Processing Symposium)

Full Text Available
Processing-in-Memory for Energy-Efficient Neural Network Training: A Heterogeneous Approach

https://doi.org/10.1109/MICRO.2018.00059

Liu, Jiawen; Zhao, Hengyu; Ogleari, Matheus A.; Li, Dong; Zhao, Jishen (October 2018, International Symposium on Microarchitecture)

Full Text Available
HOMP: Automated Distribution of Parallel Loops and Data in Highly Parallel Accelerator-Based Systems

https://doi.org/10.1109/IPDPS.2017.99

Yan, Yonghong; Liu, Jiawen; Cameron, Kirk W.; Umar, Mariam (May 2017, Parallel and Distributed Processing Symposium (IPDPS), 2017 IEEE International)

Heterogeneous computing systems, e.g., those with accelerators than the host CPUs, offer the accelerated performance for a variety of workloads. However, most parallel programming models require platform dependent, time-consuming hand-tuning efforts for collectively using all the resources in a system to achieve efficient results. In this work, we explore the use of OpenMP parallel language extensions to empower users with the ability to design applications that automatically and simultaneously leverage CPUs and accelerators to further optimize use of available resources. We believe such automation will be key to ensuring codes adapt to increases in the number and diversity of accelerator resources for future computing systems. The proposed system combines language extensions to OpenMP, load-balancing algorithms and heuristics, and a runtime system for loop distribution across heterogeneous processing elements. We demonstrate the effectiveness of our automated approach to program on systems with multiple CPUs, GPUs, and MICs.
more » « less
Full Text Available

Search for: All records